Mining Very Large Datasets with SVM and Visualization

Abstract

We present a new support vector machine (SVM) algorithm and graphical methods for mining very large datasets. We develop an active selection of training data points that can significantly reduce the size of the training set in SVM classification. We summarize the massive datasets into interval data and adapt the RBF kernel used by the SVM algorithm to handle this interval data. We keep only the data points corresponding to support vectors and the representative data points of non-support vectors, and the SVM algorithm uses this subset to construct the non-linear model. We also use interactive graphical methods to help explain the SVM results. The graphical representation of IF-THEN rules extracted from the SVM models can be easily interpreted by humans, which helps the user understand how the SVM models behave on the data. Numerical test results are reported on real and artificial datasets.
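
The idea of summarizing data into intervals and evaluating an RBF kernel on them can be illustrated with a minimal sketch. The abstract does not say which distance between intervals the authors use, so the Hausdorff-style distance below (per dimension, the larger of the gaps between lower bounds and between upper bounds) is an assumption. The function names and the `gamma` default are hypothetical as well.

```python
import math


def summarize_to_interval(points):
    """Collapse a group of points into one interval vector:
    a list of per-dimension minima and a list of per-dimension maxima."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return lows, highs


def interval_distance(a, b):
    """Assumed distance between two interval vectors: in each dimension take
    max(|low_a - low_b|, |high_a - high_b|), then combine the dimensions
    with the Euclidean norm."""
    (a_low, a_high), (b_low, b_high) = a, b
    squared = 0.0
    for al, ah, bl, bh in zip(a_low, a_high, b_low, b_high):
        squared += max(abs(al - bl), abs(ah - bh)) ** 2
    return math.sqrt(squared)


def interval_rbf_kernel(a, b, gamma=0.5):
    """RBF kernel with the point-to-point distance replaced by the
    interval distance above (gamma is an illustrative default)."""
    return math.exp(-gamma * interval_distance(a, b) ** 2)


# Two small groups of 2-D points, each summarized as one interval vector.
group_1 = [(0.0, 0.0), (1.0, 2.0)]
group_2 = [(3.0, 1.0), (4.0, 2.0)]
box_1 = summarize_to_interval(group_1)
box_2 = summarize_to_interval(group_2)

k_same = interval_rbf_kernel(box_1, box_1)  # identical intervals
k_diff = interval_rbf_kernel(box_1, box_2)  # distinct intervals
```

When every interval has zero width, this distance reduces to the ordinary Euclidean distance between points, so the sketch falls back to the standard RBF kernel in that case.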

Similar resources

Incremental SVM and Visualization Tools for Bio-medical Data Mining

Most bio-data analysis problems involve datasets with a very large number of attributes and few training data points. This situation is usually well suited to support vector machine (SVM) approaches. We have implemented a new column-incremental linear proximal SVM to deal with this problem. Without any feature selection step, the algorithm can deal with very large datasets (at least 10 attributes) ...

Enhancing SVM with Visualization

Understanding the result produced by a data-mining algorithm is as important as its accuracy. Unfortunately, support vector machine (SVM) algorithms provide only the support vectors, which are used as a “black box” to classify the data efficiently and with good accuracy. This paper presents a cooperative approach using SVM algorithms and visualization methods to gain insight into a model construction task wit...

Towards High Dimensional Data Mining with Boosting of PSVM and Visualization Tools

We present a new supervised classification algorithm that uses boosting with support vector machines (SVM) and can deal with very large data sets. Training an SVM usually requires solving a quadratic program, so the learning task for large data sets requires a large memory capacity and a long time. The proximal SVM proposed by Fung and Mangasarian is another SVM formulation that is very fast to train because it...

A Simple, Fast Support Vector Machine Algorithm for Data Mining

Support Vector Machines (SVM) and kernel-related methods have been shown to build accurate models, but the learning task usually requires solving a quadratic program, so for large datasets it requires a large memory capacity and a long time. A new incremental, parallel and distributed SVM algorithm using linear or non-linear kernels proposed in this paper aims at classifying very large datas...

Publication date: 2004